Abstract

In this thesis, a methodology is developed to experimentally test and evaluate a
programmable logic device under gamma irradiation. The purpose is to determine the
radiation effects and to characterize the improvements offered by various hardening
by design techniques. The techniques analyzed in this thesis include Error Correction
Coding (ECC) and Triple Modular Redundancy (TMR).

The TMR circuit compares three different functional implementations of adders
against TMR voted circuits built from those same adders. The TMR is implemented both
with three copies of the same adder and as a Functional TMR (FTMR) that votes on
three functionally different adders. The three adder implementations are: a behavioral
adder that allows the FPGA synthesis software to create the implementation, a ripple
carry adder that consists of multiple single-bit full adders linked together, and a
carry lookahead adder that operates the fastest by computing generate and propagate
signals. These adders are connected to single voter TMR and FTMR circuits to evaluate
the improvements that could be obtained.

The ECC circuit includes Block RAM (BRAM) and Distributed RAM memory
elements that are loaded with both ECC and non-error corrected data. The circuit is
designed to check for errors in memory data and stuck bit values in the memory, and to
measure the performance improvements that ECC provides the system.

The results show that TMR or FTMR circuits failed at a rate at or above that of the
single copy adders. This results from the single point of failure created by the voting
logic being in the radiation environment. However, when the TMR or FTMR voter is moved
off-chip, the TMR single point of failure is removed and the results demonstrate much
lower SEU error rates.

CHARACTERIZATION OF HARDENING BY DESIGN TECHNIQUES ON
COMMERCIAL, SMALL FEATURE SIZED FIELD-PROGRAMMABLE GATE ARRAYS


I.  Introduction 


1.1  Chapter  Overview 

This  chapter  covers  the  following  topics: 

1.  Motivation 

2.  Problem  Statement 

3.  Plan  of  Attack

4.  Contributions

5.  Sequence  of  Presentation

1.2  Motivation 

Space  and  terrestrial  radiation  sources  are  known  to  cause  errors  and  malfunctions 
in  integrated  circuit  designs.  These  effects  from  subatomic  particles  and  ionizing 
radiation  on  integrated  circuits  are  referred  to  as  Single  Event  Effects  (SEE).  These 
effects  can  cause  sequential  and  combinational  elements  of  the  integrated  circuit  to 
change states or values. The traditional method of minimizing these effects in space
environments is to use radiation hardened application specific integrated circuits and
programmable logic devices. However, these devices often require long lead times and
cost orders of magnitude more than non-radiation hardened devices. Therefore, various
organizations are investigating the use of commercially available circuits in these harsh
environments in order to reduce time and budget.

1.3  Problem  Statement 

State  of  the  art  systems  are  increasingly  being  developed  on  Field  Programmable 
Gate  Arrays  (FPGAs),  due  to  their  cost  and  schedule  performance  benefits  over 
traditional  Application  Specific  Integrated  Circuits  (ASICs).  However,  with  newer 
FPGAs  with  design  features  at  90nm  and  below,  radiation  effects  in  both  space  and  some 
terrestrial  environments  limit  the  effective  use  of  FPGAs.  These  effects  often  lead  to  the 
use  of  radiation  hardened  devices  to  limit  these  harmful  effects. 

Various  designs  work  to  make  FPGAs  less  susceptible  to  radiation  and  offer 
increased  reliability  over  standard  FPGA  designs.  These  design  processes  are  called 
hardening  by  design.  The  objective  of  this  work  is  to  characterize  these  improvements  so 
that  these  designs  can  be  used  in  non-critical  space  applications  that  traditionally  require 
radiation  hardened  FPGAs.  Thus,  the  experimental  design: 

1.  Evaluates  the  radiation  effects  on  various  hardened  designs 

2.  Allows  for  analysis  of  failures

3.  Allows  for  characterization  of  hardening  by  design  techniques  versus
traditional  non-hardened  designs

The specific goal of this research is to evaluate and characterize hardening by
design techniques on 90nm FPGA circuits. This facilitates replacement of physically
hardened ASICs and FPGAs and allows for improvements in the design of non-radiation
hardened electronics.

This research shows whether design hardening techniques such as Triple Modular
Redundancy (TMR), Functional Triple Modular Redundancy (FTMR) and Error
Correction Coding (ECC) can reduce system vulnerability to ionizing radiation.

1.4  Contributions 

This thesis explores the effects of gamma radiation on different FPGA
programming styles in an attempt to mitigate these effects on non-radiation hardened
FPGAs. The work provides an analysis of commercial off-the-shelf reconfigurable
electronics in radiation environments as a lower-cost alternative to the radiation
hardened devices currently used in space environments. The contributions of this work
include:

1.  Successful  design  of  a  system  for  sending,  receiving  and  analyzing  data  from
an  FPGA  device  under  radiation. 

2.  Characterization  of  design  hardening  techniques  versus  standard  FPGA 
programming.  Design  hardening  techniques  include  TMR,  FTMR  and  ECC. 

3.  An  evaluation  of  design  hardening  techniques  tested,  including  error  locations, 
causes,  and  performance  improvements. 

1.5  Sequence  of  Presentation 

This thesis is divided into five chapters followed by supporting appendices.
Chapter 1 provides motivation, a basic problem statement, a plan of attack,
and contributions. Chapter 2 provides background information relevant to developing
radiation hardened design. Chapter 3 covers the methodology used to design and test the
radiation hardened design. Chapter 4 covers results of the characterization of the design
improvements, and Chapter 5 provides conclusions and discussion of future work. The
appendices contain information considered too lengthy to include in the main body of the
text but which provides additional detail for interested readers. This
information includes the wiring setup, software code used for testing, and the raw data
obtained from the irradiations.


II.  Literature  Review


2.1  Chapter  Overview 

This  chapter  covers: 

1.  Basic  Radiation  Effects  on  Electronics 

2.  Field  Programmable  Gate  Arrays 

3.  Related  work 

2.2  Basic  Radiation  Effects  on  Electronics 

This section covers basic definitions, the radiation source for experimentation,
and the effects of ionizing radiation on a circuit.

2.2.1  Definitions 

Understanding radiation effects on electronics requires an understanding of a few
basic terms (Radiation Effects & Analysis Home Page):

•  Ionizing  Radiation  -  Electromagnetic  radiation  that  has  enough  energy  to  overcome 
the  binding  of  electrons  in  atoms  or  molecules. 

•  Single  Event  Effect  (SEE)  -  Any  measurable  effect  to  a  circuit  due  to  ion  strikes. 

•  Single  Event  Transient  (SET)  -  A  voltage  pulse  through  a  circuit  caused  by  ion 
strikes. 

•  Single Event Upset (SEU) - A change of state induced by ionization damage to a
circuit. SEUs are soft errors: a reset or rewrite of the device restores normal
device behavior.

•  Multiple Bit Upset (MBU) - An event induced by a single energetic particle that
causes multiple upsets or transients during its path through a device or system.

•  Single  Hard  Error  (SHE)  -  An  SEU  which  causes  a  permanent  change  to  the  operation 
of  a  device.  An  example  is  a  stuck  bit  in  a  memory  device. 

•  Single  Event  Latchup  (SEL)  -  A  condition  which  causes  loss  of  device  functionality 
due  to  a  single  event  induced  high  current  state.  An  SEL  may  or  may  not  cause 
permanent  device  damage,  but  requires  power  strobing  of  the  device  to  resume 
normal  device  operations. 

•  Single  Event  Burnout  (SEB)  -  A  condition  which  can  cause  device  destruction  due  to 
a  high  current  state  in  a  power  transistor. 

•  Single Event Gate Rupture (SEGR) - A single ion induced condition in power
transistors which may result in the formation of a conducting path in the gate oxide.

2.2.2  Radiation  Source 

The cobalt-60 isotope (Co-60) is used as the source of ionizing radiation for this
experiment. Co-60 undergoes beta decay with a half-life of approximately 5.27 years,
releasing one electron and two gamma rays, as illustrated in Figure 2.1.

Figure 2.1: Co-60 Decay Emitting 1 Electron and 2 Gammas

Figure 2.2: Co-60 Gamma Irradiator Layout

A Co-60 source is available at the Ohio State University (OSU) Nuclear Reactor
Lab (NRL) in Columbus, Ohio. This gamma irradiator is shown in Figure 2.2. It contains
a six inch wide aluminum tube containing a movable platform that can be lowered into
and raised out of the irradiator. The gamma irradiator cell itself sits on the bottom of a
pool of water and consists of 14 Co-60 sources evenly spread around the aluminum tube.

Figure 2.3: Dose Rate of Co-60 Irradiator, Jan 1, 2009 (Herminghuysen)

When the device under test (DUT) is lowered into the tube, the radiation dose is based on
the location of the device relative to the center of the Co-60 source rods. However, the
dose curve is based on the distance of the DUT above the bottom of the movable
platform when the platform is resting on the bottom of the aluminum tube. An example
radiation dose curve is depicted in Figure 2.3.
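
Because the dose rate of the irradiator scales with the activity of its Co-60 sources, a dose curve measured on one date can be scaled to a later test date with the exponential decay law. The following is a minimal sketch of that scaling, assuming the commonly tabulated Co-60 half-life of about 5.27 years; the function names and the reference dose rate are hypothetical and are not taken from the irradiator calibration data.

```python
# Assumption: commonly tabulated Co-60 half-life, in years.
CO60_HALF_LIFE_YEARS = 5.27


def decay_factor(elapsed_years, half_life_years=CO60_HALF_LIFE_YEARS):
    """Fraction of the original source activity remaining after elapsed_years."""
    return 0.5 ** (elapsed_years / half_life_years)


def scaled_dose_rate(reference_rate, elapsed_years):
    """Scale a dose rate measured on a reference date to a later date."""
    return reference_rate * decay_factor(elapsed_years)


if __name__ == "__main__":
    # Hypothetical reference dose rate (krad/hr); not a value from the thesis.
    reference = 100.0
    for years in (0.0, 0.5, 1.0, 2.0):
        rate = scaled_dose_rate(reference, years)
        print(f"{years:4.1f} years after calibration: {rate:6.2f} krad/hr")
```

The same factor applies at every height on the dose curve, since decay changes the source strength but not the geometry.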

2.2.3  Ionizing  Radiation  Effects  on  Electronics 

Ionizing radiation creates electron-hole pairs in materials by freeing electrons
from the atoms or molecules that they are bonded to. When this occurs in electrically
conductive materials, these electrons are free to quickly move back to their lowest energy
states, thus recombining with available holes almost instantly. However, in
nonconductive materials, such as gate oxides in NMOS transistors, the electron-hole pairs
take longer to recombine. The electrons, having a higher mobility than holes, are then
drawn from the oxide, leaving a positive charge in the oxide. When power is applied to
the gate of an NMOS device, these positively charged holes are pushed toward the gate
oxide interface with the substrate. This is a result of both the decrease in the distance
between the gate charge and the substrate and the increased positive charge on the gate.
In fact, if enough charge builds in the transistor's gate oxide, the NMOS transistor can
turn on without a voltage applied to the gate input, effectively shorting the transistor.
Figure 2.4 shows the transistor after irradiation.

According to hole-trapping models, hole traps are formed in transistor gate
oxides. If a positive bias is applied to an n-channel CMOS device, electrons are
swept out of the oxide in less than a picosecond due to the higher mobility of
electrons compared to holes. Some electrons will recombine with the holes; however,
the amount of recombination varies depending on the electric field and the ionizing
source. The holes are relatively immobile compared to the electrons and can cause a
temporary negative threshold voltage shift. Depending on the applied electric field, the
temperature, the oxide thickness, and the fabrication techniques, the holes will slowly
migrate toward the oxide-substrate interface by polaron hopping (Petrosky; Rollins,
Wirthlin and Graham).

Figure 2.4: Result of Ionizing Radiation: Formation of a Shorting Path between the
Source and Drain of NMOS Transistor (Arnold)

2.3  Field  Programmable  Gate  Arrays 

Field Programmable Gate Arrays (FPGAs) are a type of circuit that is
programmed in the field rather than during semiconductor fabrication. An FPGA consists
of programmable interconnects that allow the connection of various gates and
structures. These interconnects require a large amount of FPGA area, resulting in a chip
with very low gate density compared to Application Specific Integrated Circuits (ASICs).
The vast majority of FPGAs are SRAM-based, although there are some flash and antifuse
versions. Typically, the antifuse varieties are of interest to aerospace designers because
they are more radiation hardened. However, their increased costs reduce the advantages
of using FPGAs over ASICs. Therefore, designers are increasingly looking to use
non-hardened SRAM FPGAs in place of these hardened devices.

FPGAs are currently produced by several manufacturers, with Xilinx and
Altera dominating the market. Each manufacturer also has its own computer
based programming tools for use with its specific FPGAs. The Xilinx package of
FPGA development tools is the Integrated Synthesis Environment (ISE) Design
Suite. This suite includes many tools with two main programming environments: ISE
Foundation and Xilinx Platform Studio (XPS). The ISE tool is used primarily for
Very-High-Speed Integrated Circuits (VHSIC) Hardware Description Language (VHDL)
implementations. The XPS tool is used for implementing Intellectual Property (IP) cores
such as embedded processor designs like the MicroBlaze soft core and PowerPC
microprocessors. ISE VHDL projects can be implemented as IP cores inside XPS to
allow integration of user created cores into XPS software (Xilinx Inc).

SRAM-based FPGAs consist of control logic that routes the devices in the FPGA
fabric together. The SRAM cell, the basic structure that makes up the FPGA, is shown
in Figure 2.5.

2.4  Related  Work 

Two organizations that are heavily involved in radiation effects research are
Los Alamos National Laboratory and NASA Goddard Space Flight Center.
Therefore, many of the papers in this section come from their research and research
that they support in this field.

Figure 2.5: Basic 6 Transistor SRAM Structure in FPGA


2.4.1  Hardening  by  Design  Research  and  Simulations 

Significant research exists on hardening by design techniques. This
predominantly includes fault injection analysis of various hardened designs. Previous
work ranges from simple fault injection analysis of various TMR and Error Detection and
Correction (EDAC) designs to proposals for more advanced design hardening techniques.

An example of TMR fault injection shows TMR voting on incrementer and
counter designs. Counterintuitively, the results show that single voter TMR can produce
results as bad as or worse than a single version of a counter without redundancy. This is
a result of the single point of failure inherent in the TMR design (Rollins, Wirthlin and
Graham). Table 2.1 shows the results of fault injection on four different TMR designs.
This single point of failure can be fixed by various techniques including using feedback,
final TMR voting off chip, or utilizing check voters which validate the results of the
majority TMR voters (Rollins, Wirthlin and Graham). Additionally, research
demonstrates various EDAC techniques including ECC and more advanced techniques.
One useful technique on FPGAs is called Lightweight EDAC (LEDAC). LEDAC
implements array based code compared to traditional word or line encoding schemes.
This allows for much greater error detection and correction than
ECC with less overhead (Karl, Samson and Clark). This scheme is ideal for FPGAs due
to their high level of multi-parallelism, meaning this EDAC could function with no
existing FPGA hardware.

Table 2.1: TMR Simulation Results (Rollins, Wirthlin and Graham)

Design            Simple Incrementer                     Up/Down Loadable Counter
(single clock)    LUTs         Failures  Speed (MHz)     LUTs         Failures  Speed (MHz)

No Redundancy     [illegible]  446       220             10           463       220
1 Voter           35 (~4x)     410       217 (99%)       [illegible]  484       217 (99%)
3 Voters          51 (~6x)     14        199 (91%)       57 (~6x)     14        [missing]
Feedback          51 (~6x)     14        160 (73%)       [illegible]  15        [missing]
Map Feedback      27 (~3x)     15        194 (88%)       N/A

2.4.2  FPGA  Radiation  Analysis 

Radiation testing of the newest radiation hardened and non-radiation hardened
FPGAs is ongoing for space and terrestrial applications. This research indicates that
the primary SEUs occur in the logic memory of the SRAM FPGA. These papers
demonstrate that configuration memory and Input/Output (I/O) pads are less likely to
have SEUs affect the outputs of the FPGA since they require multiple bits to be changed
in order for an error to occur (Ceschia, Violante and Reorda). Additionally, dose ranges on
radiation hardened devices are shown to meet the minimum radiation hardened standard
of greater than 300 krad. Doses on non-radiation hardened devices are generally an order
of magnitude lower (Brown and Brewer).

Single event upsets at ground level are also observed, showing the need to add
hardening by design techniques to some safety critical applications in terrestrial
environments as well (Claeys and Simoen).

2.5  Chapter  Summary

This chapter discusses relevant radiation effects on electronics, some basic FPGA
information, and a summary of related research in the field of radiation effects on
FPGAs, specifically hardening by design techniques.


III.  Methodology


3.1  Chapter  Overview 

This chapter discusses the methodology for analysis of the hardening by design
techniques. The material covered in this chapter includes the test design, the
experimental setup, and the format of the data for analysis. The test design, covered in
Section 3.2, describes the hardening by design techniques. Section 3.3 discusses how the
hardware and software were designed for the experiment. Section 3.4 discusses the test
plan, including how the data is received from the radiation experiment and how that data
is analyzed to produce results. Finally, Section 3.5 summarizes the chapter.

3.2  Design 

The design is set up to test the effects of Triple Modular Redundancy (TMR),
Functional TMR (FTMR), and Error Correction Coding (ECC).

3.2.1  Redundant  Circuits  with  Voting 

There are several methods of voting on redundant logic or functionally redundant
logic in order to reduce the effects of SEUs on the outputs of a logic module. The most
common of these is the use of triple modular redundancy, shown in Figure 3.1, to mask
faults. This technique triplicates all inputs and logic and passes the results to a bit-wise
voter unit that takes the majority result to provide an output.

Figure 3.1: Simple Triple Modular Redundancy Voter

Single voter TMR systems can result in a single point of failure. Thus, several
techniques have been developed to mitigate the errors caused by the single point of
failure in a TMR system. These techniques include three voter TMRs, word-wise TMRs
and buffered TMRs.
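
The bit-wise majority vote and the single point of failure described above can be sketched in a few lines. This is an illustrative software model only, not the VHDL used in the experiment; the function names, the 8-bit example word, and the `flip_bit` upset model are assumptions made for the example.

```python
def majority_vote(a, b, c):
    """Bit-wise 2-of-3 majority vote over three integer words."""
    return (a & b) | (a & c) | (b & c)


def flip_bit(word, bit):
    """Model a single event upset by inverting one bit of a word."""
    return word ^ (1 << bit)


if __name__ == "__main__":
    correct = 0b10110010  # hypothetical 8-bit module output

    # An upset in one of the three redundant copies is masked by the voter.
    upset_copy = flip_bit(correct, 3)
    assert majority_vote(correct, upset_copy, correct) == correct

    # An upset in the voter output itself is not masked by anything
    # downstream: this is the single point of failure.
    voter_output = flip_bit(majority_vote(correct, correct, correct), 3)
    assert voter_output != correct
    print("copy upset masked; voter upset propagated")
```

Each output bit is voted independently, so upsets in different bit positions of different copies are still masked.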

Three voter TMRs provide three separate copies of the outputs, which greatly
reduces the errors from the single point of failure of a single voter system. The
triplicated outputs are eventually passed to a single voter for the final result. However,
this final vote adds much less relative error than voting at each intermediate step.
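
The effect of a vulnerable voter can be illustrated with the classic textbook reliability model for TMR, in which the voter is a series element behind the 2-of-3 redundant modules. This sketch assumes independent module failures and uses hypothetical reliability values; it is not a model fitted to the experimental data in this thesis.

```python
def tmr_core_reliability(r_module):
    """Probability that at least two of three independent modules are correct."""
    return 3 * r_module**2 - 2 * r_module**3


def single_voter_tmr_reliability(r_module, r_voter):
    """TMR reliability when a single voter must also work (series element)."""
    return r_voter * tmr_core_reliability(r_module)


if __name__ == "__main__":
    r_module = 0.99  # hypothetical probability that one module is correct
    for r_voter in (1.0, 0.999, 0.99):
        r_system = single_voter_tmr_reliability(r_module, r_voter)
        verdict = "better" if r_system > r_module else "no better"
        print(f"voter {r_voter:.3f}: system {r_system:.6f} "
              f"({verdict} than a single module)")
```

With these assumed numbers, a voter that is only as reliable as one module removes the benefit of triplication, which is consistent with moving or hardening the final voter.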

Word-wise TMRs involve forcing the FPGA configuration to vote on the output
value in multiple-bit sections rather than bit by bit, which is the default for voter logic.
This word-wise voting has been shown to decrease errors in simulations.

Another approach involves the use of buffer based TMRs, which have been shown
to reduce the SEU effects in the voter logic in simulations. An example of a buffer based
TMR is shown in Figure 3.2.

The  thesis  analyzes  the  effects  on  single  voter  TMRs  versus  different  functional 
implementations.  Additionally,  it  investigates  improvements  possible  by  using  a 
Functional  TMR  that  takes  a  vote  on  three  functional  implementations  instead  of  three 
copies  of  the  same  implementation.  To  show  the  improvements  of  triplicating  TMR 
logic,  the  control  board  counts  and  outputs  the  results  from  each  TMR  for  analysis  of 
improvements  of  a  final  TMR  that  is  placed  off-chip.  Due  to  constraints  of  the  serial 
output  to  the  PC,  a  total  count  of  the  control  board  TMR  errors  is  used  for  analysis. 
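
The idea of voting on three functionally different adders can be sketched in software as follows. This is a behavioral illustration, not the VHDL of the test design: the 8-bit width and all function names are assumptions, and the lookahead carries are computed in a loop here, whereas hardware would expand the same generate/propagate recurrence into parallel logic.

```python
WIDTH = 8  # assumed word width for illustration


def behavioral_add(a, b, width=WIDTH):
    """Behavioral adder: the implementation is left to the tool (here, Python)."""
    return (a + b) & ((1 << width) - 1)


def ripple_carry_add(a, b, width=WIDTH):
    """Chain of single-bit full adders; the carry ripples up from bit 0."""
    result, carry = 0, 0
    for i in range(width):
        x, y = (a >> i) & 1, (b >> i) & 1
        result |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
    return result


def carry_lookahead_add(a, b, width=WIDTH):
    """Form carries from generate/propagate signals, then the sum bits."""
    g = a & b  # generate: this bit position creates a carry
    p = a ^ b  # propagate: this bit position passes an incoming carry
    carries, c = 0, 0  # bit i of carries is the carry into position i
    for i in range(width):
        carries |= c << i
        c = ((g >> i) & 1) | (((p >> i) & 1) & c)
    return (p ^ carries) & ((1 << width) - 1)


def ftmr_add(a, b, width=WIDTH):
    """Functional TMR: bit-wise majority vote over three different adders."""
    r1 = behavioral_add(a, b, width)
    r2 = ripple_carry_add(a, b, width)
    r3 = carry_lookahead_add(a, b, width)
    return (r1 & r2) | (r1 & r3) | (r2 & r3)


if __name__ == "__main__":
    print(ftmr_add(100, 27))
```

Because the three adders differ structurally, a fault that disturbs one implementation is less likely to disturb the other two in the same way, which is the motivation for FTMR over identical copies.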

3.2.2  Error  Detection  and  Correction  Coding 

There is a wide range of error detection and correction techniques that are used to
protect memory data. These techniques range from simple error detection schemes, such
as parity checks, to more advanced error correcting codes, some of which are currently
used in memory systems. These error detection and correction techniques store
additional data in memory in order to allow for error detection. Then, if errors are
detected, the data is corrected by either resending the data from its source or by including
sufficient information in the data to allow it to correct itself.

Figure 3.2: Single Bit of LUT vs. Buffered TMR Voter (panels: LUT Voter, TBUF Voter)

One of the most common types of ECC memory involves the use of Hamming
Code. Even though a single cosmic ray can upset many physically neighboring bits in a
memory system, such memory systems are designed so that neighboring bits belong to
different words, so that an SEU causes only a single error in any particular word and can
be corrected by a single-bit error correcting code. As long as no more than a single bit in
any particular word is affected by an error between accesses, the memory system presents
the illusion of an error-free memory. The Hamming Code is based on additional memory
bits which store error correcting bits with every memory line. These error correcting
bits allow any single bit error to be corrected and any two bit error to be detected.
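
The single-error-correcting, double-error-detecting behavior described above can be illustrated with a small extended Hamming code. The sketch below protects a 4-bit value with three Hamming check bits plus one overall parity bit; the word size, bit layout, and function names are assumptions chosen for brevity and do not reflect the memory widths used in the test design.

```python
from typing import List, Optional, Tuple


def encode(nibble: int) -> List[int]:
    """Encode 4 data bits into an 8-bit codeword (list of bits).

    Index 0 holds the overall parity bit; indices 1-7 form a Hamming(7,4)
    code with check bits at positions 1, 2 and 4.
    """
    code = [0] * 8
    code[3], code[5], code[6], code[7] = [(nibble >> i) & 1 for i in range(4)]
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2
    return code


def decode(codeword: List[int]) -> Tuple[Optional[int], str]:
    """Return (data, status); status is 'ok', 'corrected' or 'double'."""
    code = list(codeword)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos  # XOR of set positions locates a single error
    overall = sum(code) % 2  # 0 when the overall parity is consistent
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flips: treat as one error and correct it.
        # A syndrome of 0 here means the overall parity bit itself flipped.
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with consistent overall parity: two bits flipped.
        return None, "double"
    data = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return data, status


if __name__ == "__main__":
    word = encode(0b1011)
    word[5] ^= 1  # inject a single-bit upset
    print(decode(word))
```

A double-bit upset is reported rather than corrected, which matches the detect-only guarantee for two-bit errors.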

3.3  Experiment 

This  section  summarizes  the  test  framework  including  the  DUT,  the  radiation 
experiment  and  the  plan  for  the  experiments  run  at  the  OSU  reactor. 

3.3.1  Device-Under-Test 

The FPGA devices under test for these radiation experiments are Xilinx Virtex 4
Mini-Modules mounted on an Avnet Mini-Module Baseboard, pictured below in Figure
3.3. The baseboard contains a socket for the two 2 x 32, 2 mm FPGA headers to connect
to multiple I/O interfaces and power supplies on the baseboard. The baseboard and
FPGA Mini-Module combination is useful because its dimensions are less than 4 by 6
inches, which is well suited to this particular radiation experiment.

Figure 3.3: DUT Virtex 4 Mini-Module

Additional features of the Mini-Module include autonomous operation without the
baseboard, which could be extremely useful for additional experimental setups. The
Xilinx Virtex 4 Mini-Module is designed as a complete system on a module. The
Mini-Module packages all the necessary functions needed for an embedded FPGA onto a tiny
footprint. The on-board MicroBlaze core provides processing capabilities, while the
configurable I/O settings offer versatile interface options.

The FPGA contained on the Mini-Module is a Virtex 4 XC4VFX12, referred to as
the Virtex4 FX12. The Virtex4 FX12 FPGA contains an array of 64 x 24 logic
blocks supporting a maximum of 86 kilobits (Kb) of distributed memory plus a separate
80 kilobytes (KB) (or 640 Kb) of block RAM. The FPGA uses 90nm transistor
technology with 10 layers of metal interconnects and triple oxide technology running
internally at 1.2 Volts (V).

The distributed memory is contained in the logic slices of the FPGA and therefore
consumes resources that could be used for other logic on the FPGA. The maximum width
of the distributed memories for the Virtex 4 is 1024; however, to maximize slice
utilization, smaller sizes are used so that they can be placed without causing routing
problems when ISE places the logical slices on the FPGA. If larger memory sizes are
needed, the block RAM slices are generally more suitable, as they are contained on
separate slices that are only used for memory storage and therefore can form a single
memory structure as large as 80 KB (Xilinx Inc).

The triple oxide technology uses three different gate oxide thicknesses
in order to increase internal speed while still supporting I/O at 3.3 V and the slower
core logic containing the configuration data. The thick gate oxide is designed to
withstand at least 3.6 V at the I/O transistor interface. The middle oxide, or mid-ox,
thickness is for core logic that does not need to be fast, and therefore the increased oxide
thickness saves FPGA power. The main use of mid-ox is for the millions of transistors
that store the configuration (six transistors for each configuration bit). Giving these
transistors a thicker gate oxide reduces their leakage current substantially (Xilinx Inc).

3.3.2  Hardware  setup 

In addition to the DUT explained above, the hardware setup used to analyze the
data from the DUT consists of several parts. These parts include an ML506 FPGA
board, two break-out boxes, two Fluke multimeters with data logging, a laptop
computer, and associated wires. Additionally, Agilent digital logic analyzers and
oscilloscopes are used in DUT design, analysis, and testing but not at the OSU irradiator.
Figure 3.4 describes the hardware setup and Figure 3.5 shows the equipment connected in
the lab.

Figure 3.4: Hardware Setup

Figure 3.5: Picture of Hardware Setup (labels: Controller Board, Device Under Test Board)

Figure 3.6: Virtex 5 ML505/6 Control Board

The ML506 FPGA board pictured in Figure 3.6 is utilized for analysis and display
of results to the laptop computer. The specific board hardware used includes 32
single-ended I/O header connections, the FPGA, pushbuttons, serial port and LEDs. This
board is utilized since it contains an FPGA with a built-in MicroBlaze microprocessor
core, allowing for programming in C++ in addition to VHDL.

The breakout boxes consist of wiring to connect the single-ended inputs of the
DUT and the control board, shown in Figure 3.7. Details of wiring attachments are
contained in Appendix A. The breakout boxes utilize RJ45 jacks to connect eight 15 ft
Cat5 Ethernet cables that send data to and from the DUT.
These breakout boxes effectively transmit the single-ended signals in excess of 15
ft. However, some noise and approximately 25 ns of signal delay occur during
transmission. This results in signal synchronization issues during testing, which are
remedied by having individual signals that are sent on the positive edge of the clock not
read until the negative edge of the clock, approximately 150 ns after transmission of the
data. An example of the clock signal at the DUT and control board I/Os is displayed in
Figure 3.8.
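
Sending on the positive clock edge and sampling on the negative edge gives each signal half a clock period to settle, so the relationship between clock frequency and available settling time can be checked with simple arithmetic. The sketch below assumes a 50% duty cycle clock; the function names are hypothetical.

```python
def half_period_ns(clock_mhz):
    """Time from a rising edge to the next falling edge, in nanoseconds.

    Assumes a 50% duty cycle clock.
    """
    return 1e3 / (2.0 * clock_mhz)


def max_clock_mhz(settle_ns):
    """Fastest clock whose half period still covers the settling time."""
    return 1e3 / (2.0 * settle_ns)


if __name__ == "__main__":
    for settle in (25.0, 50.0, 150.0):
        print(f"{settle:5.1f} ns settling time -> clock of at most "
              f"{max_clock_mhz(settle):6.2f} MHz")
```

Under this assumption, a sampling delay of roughly 150 ns corresponds to a clock of a few megahertz.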

The laptop used for programming the boards and collecting the data is a Dell
Latitude D830. The laptop contains HyperTerminal software for communication with
the MicroBlaze processor on the FPGA and the Xilinx software necessary for programming
the FPGA boards.

In addition to the equipment used for the radiation testing, an Agilent Logic
Analyzer is utilized to view the signals transmitted between the DUT and control board.
This device allows analysis of data synchronization and functionality of the DUT
both before and after the irradiations. An example of the data from the logic analyzer is
shown in Figure 3.9. The figure shows that the data during the positive half of the clock
cycle is noisier.

Figure 3.8: Clock Pin Transmission from DUT to Control Board (Scale: 20 ns/ horizontal,
1 V/ vertical)

3.3.4  Software  Setup 

Two separate software setups are made for the radiation tests. The software for both the DUT and the controller board is written in VHDL using the Xilinx ISE software. However, the control board files are transferred to Xilinx XPS as an IP core so that the MicroBlaze microprocessor core can be used for display and analysis of results. The actual code is provided in Appendix B.
Figure 3.9: Data Synchronization with Clock

3.3.4.1  TMR  Software  Setup 

The first software setup consists of DUT code to test triple modular redundancy against the three different adder designs. This code uses the 29 single-ended I/Os on the DUT board to receive inputs and to send out adder and TMR results at a frequency of at most 3.24 MHz. The maximum frequency is determined by observing line delays and noise on the single-ended I/O lines. The logic analyzer shows noise-induced errors on the lines for up to 50 ns after transmission, which indicates that the clock needs to run at 10 MHz or slower in order to capture data during the second half of the clock cycle. The system is tested with various frequency divisors applied to the original 100 MHz DUT clock. The minimum divisor that captures data without capturing erroneous data is 32. Because this is slower than the logic analyzer measurements suggest, it indicates that the logic analyzer recognizes one and zero value transitions differently than the control board.
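The timing argument above can be checked with a short calculation. The following Python sketch is illustrative only: the function names are hypothetical, the 50 ns noise window and the 100 MHz source clock are taken from the text, and an ideal clock divider is assumed.

```python
# Illustrative timing arithmetic; function names are hypothetical.
NOISE_WINDOW_NS = 50.0   # noise persists up to 50 ns after transmission (from the text)
DUT_CLOCK_MHZ = 100.0    # nominal DUT clock before division (from the text)


def max_clock_mhz(noise_window_ns):
    """Fastest clock whose half period is at least as long as the noise window."""
    period_ns = 2.0 * noise_window_ns
    return 1000.0 / period_ns


def half_period_ns(clock_mhz):
    """Time between the sending (positive) edge and the reading (negative) edge."""
    return 1000.0 / clock_mhz / 2.0


def divided_clock_mhz(divisor, source_mhz=DUT_CLOCK_MHZ):
    """Output frequency of an ideal clock divider (an assumption of this sketch)."""
    return source_mhz / divisor
```

Under these assumptions, `max_clock_mhz(50.0)` gives the 10 MHz limit quoted above, and `half_period_ns(3.24)` gives roughly 154 ns, consistent with the approximately 150 ns read delay described in the hardware setup. An ideal division of 100 MHz by 32 gives 3.125 MHz, which is close to, but not identical with, the 3.24 MHz figure quoted in the text; the sketch does not attempt to explain that difference.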
An illustration of the code with the three adders and the FTMR outputting data is shown in Figure 3.10. An illustration of the code used to compare different TMR structures with the FTMR structure is shown in Figure 3.11.

The FTMR/Adder DUT code is paired with controller board code that compares the results to the truth source produced on the controller board and displays the results through the UART connection to the PC running HyperTerminal. The comparator logic operates at the same frequency as the DUT board clock and outputs data to registers for display by the MicroBlaze processor. The processor operates at 235 MHz on the Virtex 5 board and analyzes and controls the display of the data contained in the registers. The data is then displayed to the user via the RS232 Serial Communication IP Core running at 9600 bps. If errors occur faster than they can be displayed by the serial communication, the individual error data is lost. Therefore, error totals are displayed in order to account for errors that are not displayed individually. This also means that off-chip voting must be done in real time on the control board with running totals of errors, since post-analysis may not be possible if multiple errors occur. A diagram of this structure is shown in Figure 3.12.
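The comparison against a truth source and the off-chip vote with running totals can be sketched in software. The Python model below is a minimal illustration and is not the VHDL of Appendix B: the class and function names are hypothetical, a 4 bit sum is assumed, and the labels `rc`, `beh`, `cla`, `ftmr`, and `fcnt` are borrowed from the output format in Table 3.3.

```python
def majority_vote(a, b, c):
    """Bitwise 2-of-3 majority of three equal-width integer words."""
    return (a & b) | (a & c) | (b & c)


class ErrorTally:
    """Running error totals, one counter per monitored output."""

    def __init__(self, names):
        self.counts = {name: 0 for name in names}

    def compare(self, truth, outputs):
        """Count every output that disagrees with the truth value this cycle."""
        for name, value in outputs.items():
            if value != truth:
                self.counts[name] += 1


def run_cycle(tally, a, b, rc, beh, cla, dut_ftmr, width=4):
    """Process one clock cycle of DUT results on the control board side."""
    truth = (a + b) & ((1 << width) - 1)   # truth source computed off chip
    outputs = {
        "rc": rc,
        "beh": beh,
        "cla": cla,
        "ftmr": dut_ftmr,                      # vote performed on the DUT
        "fcnt": majority_vote(rc, beh, cla),   # vote performed off chip
    }
    tally.compare(truth, outputs)
    return outputs
```

In this model a single faulty adder is outvoted by the off-chip voter, while a fault in the on-chip voter output is still counted against `ftmr`. Because only totals are kept, the counters remain valid even when individual error records cannot be displayed.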
Figure 3.10: Virtex 4 FPGA FTMR/Adder Structure

Figure 3.11: Virtex 4 FPGA TMR/FTMR Structure

Figure 3.12: Virtex 5 FPGA FTMR/TMR/Adder Analysis Structure

The total resource allocation for each functional adder in comparison to the TMR structure is important for analyzing the potential radiation damage. The DUT utilization summary is contained in Table 3.1. These numbers reflect the FPGA components used but do not reflect the actual structure of these components.

Table 3.1: Utilization of Several Functional Units

Module    Slices    Slice Regs    LUTs
RC        3         4             3
Behav     4         4             12
CLA       3         4             3
FTMR      12        6
The designs of the ripple carry and carry look ahead adders are shown in Figures 3.13 and 3.14, where a and b are the adder inputs and S and C are the Sum and Carry outputs, respectively. The behavioral adder is not shown since it is created by the Xilinx synthesis tool and is essentially a black box with A and B inputs and S and C outputs.
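The functional difference between the ripple carry and carry look ahead designs can be illustrated with a bit-level software model. The Python sketch below is not the thesis VHDL; the function names are hypothetical and a 4 bit width with no carry-in is assumed. The ripple carry version chains single bit full adders, while the carry look ahead version computes each carry directly from the generate and propagate signals.

```python
def full_adder(a, b, cin):
    """Single bit full adder returning (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(a, b, width=4):
    """Chain of full adders; each stage waits for the previous carry."""
    carry = 0
    total = 0
    for i in range(width):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry


def carry_lookahead_add(a, b, width=4):
    """Carries formed as a sum of products of generate/propagate signals."""
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]     # generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]   # propagate

    def carry_into(i):
        # c_i = OR over j < i of (g_j AND p_(j+1) AND ... AND p_(i-1)), with c_0 = 0
        carry = 0
        for j in range(i):
            term = g[j]
            for k in range(j + 1, i):
                term &= p[k]
            carry |= term
        return carry

    total = 0
    for i in range(width):
        total |= (p[i] ^ carry_into(i)) << i
    return total, carry_into(width)
```

Both functions return the same (S, C) pair as a behavioral `a + b` for every input, which is the property the FTMR voter relies on: three structurally different circuits that should always agree.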

The actual structures of the LUTs used for each adder can also be viewed through use of the ISE tools. The logic used for the first bit of each adder is shown in Figures 3.15, 3.16, and 3.17. These figures represent only a small section of the devices, with the carry look ahead adder also having additional logic for the propagate and generate functions.

Figure 3.13: Ripple Carry Adder Structure

Figure 3.14: Carry Look Ahead Adder Structure

Figure 3.16: 1st Bit of Behavioral Adder

Figure 3.17: 1st Bit of Carry Look Ahead Adder
3.3.4.2 ECC Memory Software Setup

The second software setup is built to test the radiation effects on an FPGA containing various memory structures. The four structures include 32 KB of block RAM (BRAM), 20 KB of BRAM with ECC, up to 8 KB of distributed memory, and up to 13 KB of distributed memory with ECC. These four structures are chosen since they maximize the available block RAM, 80 KB, and the available distributed memory, up to 86 Kb. The memory units are each addressed separately, with the entire memory structure being read or written every 0.0202 seconds. Only one memory address is read per clock cycle to avoid collisions between data being sent out by different memory units. The addresses of the structures are shown in Table 3.2. The memory is loaded with equal sections of four different hex memory patterns: 00, FF, 55, and C3. These hex patterns translate to binary 00000000, 11111111, 01010101, and 11000011. They are chosen to determine whether particular memory values and patterns are more susceptible to radiation-induced SEUs.
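The pattern loading can be modeled in a few lines. In the Python sketch below the function names are hypothetical, and the layout of the four sections (contiguous, in the order the patterns are listed) is an assumption, since the text states only that the sections are equal in size.

```python
PATTERNS = (0x00, 0xFF, 0x55, 0xC3)   # the four test patterns named in the text


def expected_value(address, depth):
    """Pattern expected at `address` when `depth` words form four equal sections."""
    section = address * len(PATTERNS) // depth
    return PATTERNS[section]


def fill_memory(depth):
    """Return the initial memory image for a structure of `depth` bytes."""
    return [expected_value(addr, depth) for addr in range(depth)]
```

The same `expected_value` function can serve as the reference during read-back, so no separate copy of the memory image is needed to detect an upset.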

In addition to the error check, the memory structure runs a stuck bit check, which takes 3 read/write cycles. The stuck bit check first loads the negated memory values into each memory address during one write cycle. It then checks the memory during the following memory cycle. Finally, the check reloads the original pattern back into memory. The stuck bit check runs once every 1021 read cycles; however, this can easily be altered or removed if stuck bits do not show up in the data during radiations.
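The three cycles of the stuck bit check can be sketched against a simple memory model. The Python code below is an illustration only: `FaultyMemory` and its stuck-bit masks are hypothetical test scaffolding, and the scheduling helper simply encodes the 1021 cycle interval stated above.

```python
STUCK_CHECK_INTERVAL = 1021   # read cycles between stuck bit checks (from the text)


class FaultyMemory:
    """Byte-wide memory model with optional stuck-at bits (illustrative only)."""

    def __init__(self, contents, stuck_at_one=None, stuck_at_zero=None):
        self.cells = list(contents)
        self.stuck1 = dict(stuck_at_one or {})    # address -> mask of bits stuck at 1
        self.stuck0 = dict(stuck_at_zero or {})   # address -> mask of bits stuck at 0

    def write(self, addr, value):
        self.cells[addr] = value & 0xFF

    def read(self, addr):
        value = self.cells[addr] | self.stuck1.get(addr, 0)
        return value & ~self.stuck0.get(addr, 0) & 0xFF


def is_stuck_check_cycle(read_cycle):
    """True on the read cycles where the stuck bit check is scheduled."""
    return read_cycle % STUCK_CHECK_INTERVAL == 0


def stuck_bit_check(mem, pattern):
    """Return {address: mask of stuck bits} using the three-cycle procedure."""
    for addr, value in enumerate(pattern):        # cycle 1: write negated values
        mem.write(addr, ~value & 0xFF)
    stuck = {}
    for addr, value in enumerate(pattern):        # cycle 2: read back and compare
        diff = mem.read(addr) ^ (~value & 0xFF)
        if diff:
            stuck[addr] = diff
    for addr, value in enumerate(pattern):        # cycle 3: restore original pattern
        mem.write(addr, value)
    return stuck
```

The inversion step matters because a bit stuck at the same value as the stored pattern is invisible to the normal error check; it only appears once the opposite value is written.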

Figure 3.18: Virtex 4 FPGA Memory Structure

The structure of the DUT board is shown in Figure 3.18. For each memory address, the board outputs the corresponding memory information and checks it against the expected value for that location. For ECC memories, this means that the DUT checks the 13 bit encoded value. Any discrepancies are reported to the controller board as 8 bit data plus the error correction code. For the ECC portion of memory, the data is checked for errors while it is still encoded. If there are errors, it is then decoded for transmission back to the control board. This is done so that errors that are corrected by the ECC can be seen, in order to evaluate the performance of the ECC. Additionally, when errors are detected, the correct value is rewritten into the memory address to attempt to fix the errors.
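The text does not state which error correction code produces the 13 bit encoded value. One common construction with exactly that width is a Hamming code over 8 data bits (4 check bits) extended with one overall parity bit, giving single error correction and double error detection. The Python sketch below implements that construction purely as an assumption; the bit ordering and function names are hypothetical and may differ from the code used on the DUT.

```python
DATA_POSITIONS = (3, 5, 6, 7, 9, 10, 11, 12)   # codeword positions holding data bits
PARITY_POSITIONS = (1, 2, 4, 8)                # Hamming check bit positions
WIDTH = 13                                     # position 0 holds the overall parity


def encode(data):
    """Encode an 8 bit value into an assumed 13 bit SECDED codeword."""
    bits = [0] * WIDTH
    for i, pos in enumerate(DATA_POSITIONS):
        bits[pos] = (data >> i) & 1
    for p in PARITY_POSITIONS:
        # Check bit p covers every position whose index has bit p set.
        for pos in range(1, WIDTH):
            if pos & p and pos != p:
                bits[p] ^= bits[pos]
    for pos in range(1, WIDTH):
        bits[0] ^= bits[pos]
    return sum(bit << pos for pos, bit in enumerate(bits))


def decode(word):
    """Return (data, status); status is 'ok', 'corrected', or 'uncorrectable'."""
    bits = [(word >> pos) & 1 for pos in range(WIDTH)]
    syndrome = 0
    for pos in range(1, WIDTH):
        if bits[pos]:
            syndrome ^= pos            # XOR of the positions of all set bits
    overall = 0
    for bit in bits:
        overall ^= bit
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1 and syndrome < WIDTH:
        bits[syndrome] ^= 1            # single error; syndrome 0 is the parity bit itself
        status = "corrected"
    else:
        status = "uncorrectable"       # even number of flips with nonzero syndrome
    data = sum(bits[pos] << i for i, pos in enumerate(DATA_POSITIONS))
    return data, status
```

Comparing the stored word against `encode(expected)` while it is still encoded, as described above, reveals upsets that a decode-only check would silently correct.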

The controller board software paired with this memory DUT software contains the logic to collect all the memory errors and display the total number of errors, differentiating between the original patterns and the inverted patterns loaded into each section of the memory. Additionally, when an error occurs, the control board outputs the data with a time stamp based on the DUT clock. However, if errors at multiple addresses occur faster than the data can be captured by the UART running at 9600 bps, the actual error is not displayed and only the error count can be used for analysis. This method offers a glimpse at the error data but primarily counts total errors in the memory structure for purposes of analysis. To account for this known system limitation, the data is fixed by the DUT when an error is found, and the control board records how many single bit and multiple bit errors occurred in each memory structure. Figure 3.19 shows the structure of the control board IP cores, specifically the test IP core consisting of the VHDL code that captures the data from the DUT.
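The single bit versus multiple bit bookkeeping can be illustrated as follows. This Python sketch is a software stand-in for the control board logic, with hypothetical names; it classifies an error by counting the bits that differ between the expected and observed values.

```python
def classify_error(expected, observed):
    """Return 'none', 'single', or 'multiple' from the number of differing bits."""
    flipped = bin(expected ^ observed).count("1")
    if flipped == 0:
        return "none"
    return "single" if flipped == 1 else "multiple"


class MemoryErrorCounters:
    """Running single/multiple bit error totals for each memory structure."""

    def __init__(self, structures):
        self.counts = {name: {"single": 0, "multiple": 0} for name in structures}

    def record(self, structure, expected, observed):
        kind = classify_error(expected, observed)
        if kind != "none":
            self.counts[structure][kind] += 1
        return kind
```

Since only the totals are updated, the counters stay accurate even when the serial link is too slow to show each individual error.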


Figure  3.19:  Virtex  5  Memory  Analysis  Structure 
Table 3.2: Memory Structure Addresses

Memory Structure    Addresses    Base Address        End Address
BRAM                0-16k        0000000000000000    0011111111111111
BRAM ECC            16-32k       0100000000000000    0111111111111111
Dist. RAM           32-40k       1000000000000000    1001111111111111
Dist. RAM ECC       40-48k       1010000000000000    1011111111111111
BRAM                48-64k       1100000000000000    1111111111111111
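The address map in Table 3.2 can be expressed as a lookup, and the quoted scan time follows from the address count. In the Python sketch below the names are hypothetical; the region boundaries are the table's binary addresses written in hexadecimal, and the 3.24 MHz clock with one address per cycle is taken from the text.

```python
# (structure, base address, end address) from Table 3.2, written in hex
REGIONS = (
    ("BRAM", 0x0000, 0x3FFF),
    ("BRAM ECC", 0x4000, 0x7FFF),
    ("Dist. RAM", 0x8000, 0x9FFF),
    ("Dist. RAM ECC", 0xA000, 0xBFFF),
    ("BRAM", 0xC000, 0xFFFF),
)


def region_for(address):
    """Name of the memory structure that owns a 16 bit address."""
    for name, base, end in REGIONS:
        if base <= address <= end:
            return name
    raise ValueError("address outside the 16 bit map")


def full_scan_seconds(clock_hz=3.24e6, addresses=1 << 16):
    """Time to visit every address once at one address per clock cycle."""
    return addresses / clock_hz
```

With these values `full_scan_seconds()` evaluates to about 0.0202 s, matching the read/write interval stated in Section 3.3.4.2.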

3.4  Test  Plan 

Radiation analysis is conducted on a maximum of 10 FPGA Mini-modules. The initial test is run at a dose rate of 50 krad (Si)/hr. This dose rate is expected to cause data errors and board failure based on research with previous generations of FPGAs (Wang, Katz and Cronquist). However, since no data is available for Virtex 4 FPGAs or for packaged FPGAs such as the Virtex 4 Mini-module, the first radiation determines the dose rates for additional runs. In addition to the error data collected for analysis, current and voltage data is also recorded in order to predict expected errors and device failure on future runs. This ideally allows devices to be powered off or removed from the source before any permanent damage occurs.

The analysis begins with the FTMR/Adder structure, which contains the greatest variety of test structures. This test compares three designs of adders to a TMR system that votes on the outputs of all three adders. It shows which adder designs are the most robust and shows the performance improvement of both on-chip and off-chip FTMR structures. Next, the FTMR/TMR structure is irradiated to evaluate the improvements of the FTMR structure compared to traditional TMR structures. Finally, the memory structure tests are run to evaluate the radiation exposure on a larger portion of the FPGA, as well as to evaluate the improvements of ECC memory over non-ECC memory in both BRAM and distributed memories. These tests are the lowest priority because the errors observed by the control board are difficult to trace: errors caused by the additional logic on the DUT could look like errors in the actual memory addresses tested.

3.4.1  Data  Format 

The data output from the control board is used to characterize radiation effects on the various hardening by design techniques. There are two basic formats of output data, depending on the experiment being run. The first type is for the adder and TMR data. The second type contains address and data information from the memory system. Both sets of data include a system clock for analysis. All data values are in hexadecimal, which shortens the output line and thus increases the amount of data that can be displayed.

The adder and TMR data format is listed in Table 3.3. The data shows each of the four functional units with its current status compared to the truth value of that unit, followed by the error count of the functional unit. The functional units are followed by the current status and error counts of the three different types of counter being tested. The counter data is shorter than the adder TMR data since the counter outputs are not sent to the control board; instead, the error code produced by the majority voter on the counters is sent to the control board. Similarly, the data for the FTMR/TMR irradiations contains the four functional units with outputs displayed in the same format, followed by the counter data tested.

Table 3.3: TMR Data Collection Format

Example Output Data
rcEE#0 behEE#0 claEE#F ftmrEE#0 fcnt#0 clk5DDEF05
rcEE#0 behEE#0 claEE#F ftmrEE#0 cntr0:0:0:0 clk5DDEF05

DUT Data sent from each Device
Control Board Truth Data
Cumulative Error Counts for each Device
3.24 MHz Clock Counter

rc = 1st Device on DUT
beh = 2nd Device on DUT
cla = 3rd Device on DUT
ftmr = 4th Device on DUT
fcnt = Control Board TMR voting on Devices 1-3
cntr = TMR error code output from counter TMR on DUT
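A log line in the Table 3.3 format could be split into fields for post-processing roughly as follows. This Python sketch rests on assumptions that the source does not confirm: each functional unit token is taken to be a label, one hex digit of DUT data, one hex digit of truth data, a `#`, and a hex error count, and the sample line is modeled on the table's example. The function and field names are hypothetical.

```python
import re

# Assumed token layout: label, DUT hex digit, truth hex digit, '#', hex error count
UNIT_FIELD = re.compile(r"^(rc|beh|cla|ftmr)([0-9A-F])([0-9A-F])#([0-9A-F]+)$")


def parse_line(line):
    """Split one output line into a dictionary of decoded fields."""
    record = {}
    for token in line.split():
        match = UNIT_FIELD.match(token)
        if match:
            name, dut, truth, errors = match.groups()
            record[name] = {
                "dut": int(dut, 16),
                "truth": int(truth, 16),
                "errors": int(errors, 16),
            }
        elif token.startswith("clk"):
            record["clk"] = int(token[3:], 16)        # clock counter value
        elif token.startswith("cntr"):
            record["cntr"] = token[4:].split(":")     # counter TMR error code fields
        else:
            record.setdefault("other", []).append(token)
    return record
```

Unrecognized tokens are kept under `other` rather than discarded, so a line in a slightly different format (for example one carrying `fcnt`) is not silently lost.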

The format of the memory output includes address, memory data, clock, and cumulative errors for each of the four memory types. The cumulative errors are necessary since the serial communication to the PC is limited to 9600 bps, and therefore only some data errors are displayed if they are sent to the control board while another error is being displayed.
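The size of the gap between the serial link and the DUT clock can be estimated with a short calculation. In the Python sketch below the function names are hypothetical, and the framing of 10 bits per character (one start bit, eight data bits, one stop bit) is an assumption about the UART configuration; the 9600 bps rate and 3.24 MHz clock are taken from the text.

```python
def chars_per_second(baud=9600, bits_per_char=10):
    """Characters per second on the serial link (10 bit framing assumed)."""
    return baud / bits_per_char


def lines_per_second(line_length, baud=9600, bits_per_char=10):
    """Complete output lines per second for a given line length in characters."""
    return chars_per_second(baud, bits_per_char) / line_length


def cycles_per_line(line_length, clock_hz=3.24e6, baud=9600, bits_per_char=10):
    """DUT clock cycles that elapse while one line is being transmitted."""
    return clock_hz / lines_per_second(line_length, baud, bits_per_char)
```

For a hypothetical 56 character line, the link carries about 17 lines per second, so on the order of 189,000 DUT clock cycles pass while a single line is displayed. Any errors arriving in that window can only be captured by the cumulative counters.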


3.5  Methodology  Summary 

The system is built to test the radiation effects on the Virtex 4 FPGA. The DUT for this analysis is a Virtex 4 Mini-module. Section 3.2 describes the hardware setup, including the control board that provides inputs, analysis, and data display to a PC. The two test setups for analysis are described in Section 3.3. They include a memory and ECC memory setup and a TMR setup. Section 3.4 describes the test plan for the irradiations at OSUNRL.
IV. Results and Analysis


4.1 Chapter Overview

This chapter covers the following material:

1. Radiations Summary

2. Current Draw during Radiation Testing

3. FTMR/TMR/Adder Results

4. Memory Results

The raw data is in Appendix C, while the detailed results of each radiation run are in Appendix D.

4.2  Radiations  Summary 

The analysis involves eight irradiations. All radiations use new Virtex 4 Mini-modules, which are tested in the lab prior to radiation and operate satisfactorily without producing any errors. Radiation #2 is not complete because the current draw reached the maximum allowed by the Agilent power supply, and the run is therefore terminated early. For the remaining runs a new power supply is used. Radiation #3 also did not produce a total ionizing dose for failure since the device is removed while the FPGA DUT is still operating. Figure 4.1 and Table 4.1 summarize the radiation dose rates and time to failure for each run.
Figure 4.1: Radiations 1-8, Dose Rate (krad (Si)/hr) vs. Position in Tube (Co-60 Irradiator: March 13, 2009)


Table 4.1: Summary of Radiations

Radiation #    Dose Rate (krad(Si)/hr)    Radiation Time    Time to Failure    Total Ionizing Dose (krad(Si))    Test Run
1              50                         0:19:17           0:19:17            16.71                             FTMR/Adder
2              35                         0:36:30           N/A                21.29                             FTMR/Adder
3              35                         2:57:24           N/A                103.48                            FTMR/Adder
4              67                         0:31:02           0:31:02            34.65                             FTMR/Adder
5              35                         2:42:45           2:42:45            94.94                             FTMR/Adder
6              50                         1:06:55           1:06:55            57.99                             TMR/FTMR
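The total ionizing dose in Table 4.1 is the product of dose rate and exposure time, which can be checked with the short Python sketch below. The function names are hypothetical and the run data is copied from the table.

```python
def hms_to_hours(text):
    """Convert an 'h:mm:ss' duration string to hours."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours + minutes / 60 + seconds / 3600


def total_dose_krad(rate_krad_per_hr, duration):
    """Total ionizing dose in krad(Si) for a constant dose rate."""
    return rate_krad_per_hr * hms_to_hours(duration)


def implied_rate(tid_krad, duration):
    """Dose rate in krad(Si)/hr implied by a tabulated dose and duration."""
    return tid_krad / hms_to_hours(duration)


# (radiation #, nominal rate, radiation time, tabulated TID) from Table 4.1
RUNS = [
    (1, 50, "0:19:17", 16.71),
    (2, 35, "0:36:30", 21.29),
    (3, 35, "2:57:24", 103.48),
    (4, 67, "0:31:02", 34.65),
    (5, 35, "2:42:45", 94.94),
    (6, 50, "1:06:55", 57.99),
]
```

The 35 and 67 krad (Si)/hr rows reproduce the tabulated doses from the nominal rates. For Radiations #1 and #6 the tabulated doses correspond to an implied rate of about 52 krad (Si)/hr, which suggests that the 50 krad (Si)/hr figure is a rounded nominal value; the source does not state this explicitly.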
A summary of the different software codes is given in Tables 4.2, 4.3, and 4.4. Table 4.2 shows the code version that is used on each run. Tables 4.3 and 4.4 describe the differences in each code setup used. DUT code version 2.2 replaced the 4 bit counter TMR result code with the carry-outs from each of the 4 functional units being tested. This is done to get a better understanding of the effects on the whole adder units.

Table 4.2: Software Configurations for each Radiation

Radiation #    Date Tested    Control Board Version    DUT Code Version    Outputs of DUT
1              1/23/2009      1                        1                   FTMR/3 Adders/Cntr TMR
2              2/25/2009      2                        2                   FTMR/3 Adders/Cntr TMR
3              2/25/2009      2                        2                   FTMR/3 Adders/Cntr TMR
4              3/2/2009       3                        2                   FTMR/3 Adders/Cntr TMR
5              3/2/2009       3                        2                   FTMR/3 Adders/Cntr TMR
6              3/2/2009       3                        2.1                 FTMR/3 TMRs/Cntr TMR
7              3/13/2009      4                        2.2                 FTMR/3 Adders
8              3/13/2009      4                        2.3                 FTMR/CLA Adders

Table 4.3: DUT Code Versions

DUT Code Version    Outputs                                                 Change
1                   CLA, Behavioral, RC, FTMR                               Original code
2                   CLA, Behavioral, RC, FTMR, Counter TMR data
2.1                 3 TMRs (CLA, Behavioral, RC), FTMR, Counter TMR data
2.2                 CLA, Behavioral, RC, FTMR                               Counter TMR data replaced by carry-outs of adders and FTMR
2.3

Table 4.4: Control Board Code Versions

Control Code Version    Change
1                       Original working code
2                       Changed communication between boards to make it a robust design without errors and added counter TMR to additional output lines
3                       Minor modifications to output format to include regular clock updates at 10 sec intervals and error counts at 2 min intervals
4                       Set up to receive and compare 5 bit inputs from DUT; also analyzes data in a control board TMR
The results of the radiations indicate that the total ionizing dose required to cause device failure on the Virtex 4 Mini-modules is difficult to predict. In fact, the run with the largest dose rate, Radiation #4, experienced failure at nearly twice the total ionizing dose that caused Radiation #1 to fail. This is likely the result of the code revision done after Radiation #1 but could also be a result of variations in the modules that are used for analysis. Additionally, these variations could be the result of variations in the placement of the device within the gamma irradiator itself. The position could only be controlled along the vertical axis; the actual rotation of the device in the gamma cell is not controllable. Therefore, an analysis of current draw versus device failure and SEUs is presented in Section 4.3.

4.3  Current  Draw 

The current draw of the FPGAs indicates the amount by which leakage current in the FPGA increases due to the EHPs described in Section 2. It indicates when SEUs might cause incorrect values to be recorded. This current is also used to determine at what point an SEB would be expected to permanently damage the device under radiation. Figures 4.2 and 4.3 show the current during each irradiation for all 6 radiations. These figures are divided between the 35 krad (Si)/hr radiations and the higher dose rates so that trends are easily observed.
Figure 4.2: FPGA Supply Current vs. Time at 35 krad (Si)/hr

Figure 4.2 shows similar current draw for the 35 krad (Si)/hr radiations. Both Radiations 5 and 8 experienced enough radiation effects to cause device failure. As discussed above, Radiations 2 and 3 did not result in device failure before the DUT is removed from radiation. However, the devices did not experience similar maximum currents prior to shutdown, as would be expected from Wang et al., discussed in Chapter 2. Based on these results, it is difficult, if not impossible, to determine device failure based on current draw alone. One explanation for this result is the fact that the FPGA is integrated into the device, unlike previous FPGAs, which are plugged into integrated circuit sockets. Therefore, the current analysis may reveal more consistent results prior to failure if the three FPGA power supplies (1.2 V, 2.5 V, and 3.3 V) are measured where they connect directly to the FPGA, as opposed to the current draw through the constant voltage 5 V power supply, which is what is measured.

Figure 4.3 shows the currents of the three higher dose rate radiations. As discussed in Chapter 3, the expected dose rate for maximum errors is not known prior to testing. Therefore, Radiation #1 is used as a baseline. However, after the device used in Radiation #3 did not fail and produced limited SEU errors after nearly 3 hours at a dose rate of 35 krad (Si)/hr, compared to the 50 krad (Si)/hr of Radiation #1, a higher dose rate is used in Radiation #4. Radiation #4 then resulted in device failure faster than expected.


Figure 4.3: FPGA Supply Current vs. Time at 50 krad (Si)/hr and 67 krad (Si)/hr
Radiation #6 is conducted to re-evaluate the 50 krad (Si)/hr dose rate and showed that some other effect is occurring in Radiation #1 that did not occur in the other radiations. Therefore, for the analysis, Radiation #1 data is considered incomplete, and Radiations #4 and #6 are taken as the estimated TID for failure. Radiation #7 also did not experience TID failure and is therefore also excluded for this purpose.


4.4  FTMR/Adder  Error  Results 

The error results for the FTMR and each functional adder are in Table 4.5. The data shows that the FTMR circuit experienced the most errors at the output of the DUT. This result is initially surprising given the expected improvements of a TMR circuit over a single copy of a circuit. However, when analyzing the device utilization of each component, the FTMR circuit utilization is higher than that of any of the adders by themselves. Based on the results of the above test, an analysis of traditional TMR designs of the 3 different functional units is done. This implementation is done to verify whether a single copy TMR design can outperform the FTMR. Each TMR is based on 3 identical copies of one adder. The FTMR vs. TMR error results are in Table 4.6.


Table 4.5: Single Adder vs. Single FTMR Errors (* partial radiations)

Radiation #    Dose Rate (krad(Si)/hr)    RC Adder Errors    Behavioral Adder Errors    CLA Adder Errors    FTMR Errors
2*             35                         0                  0                          0                   1
3*             35                         0                  0                          1                   3900
4              67                         4260               0                          6124                6657
5              35                         1160759            1454061                    1618623             1532825
Table 4.6: TMR vs. FTMR Errors

Radiation #    Dose Rate (krad(Si)/hr)    RC TMR Errors    Behavioral TMR Errors    CLA TMR Errors    FTMR Errors
6              50                         6293004          4763530                  5354154           4763530

The results of comparing various TMR units with single triplicated functional units reveal that the FTMR design can produce results worse than the simple single functional unit TMRs. This indicates that the likelihood of errors on any functional component increases proportionally to other functional units. Therefore, these results indicate that an FTMR circuit does not produce increased protection as hypothesized in Chapter 3.

For further analysis of the data, the cumulative errors over time for Radiations #4, #5, and #6 are shown in Figures 4.4, 4.5, and 4.6. These figures show the errors just prior to device failure, since the errors prior to this point are extremely rare and appear to vary randomly when compared to the results produced as the FPGA reaches its failure point. These graphs end either when the clock signal stopped transmitting to the control board or when correct results stopped being sent to the controller board for an entire display cycle of the serial communication.
Figure 4.4: Errors on Radiation #4, 67 krad (Si)/hr (Cumulative Errors vs. Time in seconds)

Figure 4.5: Errors on Radiation #5, 35 krad (Si)/hr

Figure 4.6: Errors on Radiation #6, 50 krad (Si)/hr (Cumulative Errors vs. Time)

These three figures show that the majority of errors occur just prior to device failure. This is a major limitation of this type of testing. In fact, fewer than 1000 total errors, across all functional units, occur prior to the minute before device failure on all 8 radiations.

Additional analysis of the results indicates that the ripple carry adder has the best performance of the three types of adders analyzed. This result is actually better than the result of the single voter FTMR implemented.


4.6  Results  Summary 

The results show that the single voter FTMR device does not perform as well as traditional TMR circuits built with single copies of more robust adders. This shows that the cycles in which errors occur on different functional units have less to do with structure and more to do with the EHPs in the individual structures. Additionally, the FTMR circuit performed worse than single devices outputting data to the control board in almost all cases. However, when the FTMR is moved off chip, the results are significantly improved. This indicates that the single point of failure of a TMR circuit should be mitigated by one of the techniques discussed in Chapter 2.
V. Conclusions


5.1  Overview 

This  chapter  covers  the  following  material: 

1. A basic conclusion statement

2.  Applications 

3.  Future  Studies 

5.2  Conclusion  Statement 

Radiation effects produce SEUs and device failure in Virtex 4 Mini-modules, allowing characterization of the hardened by design components. Single voter TMR and FTMR structures placed on the DUT experience error rates as large as the single units tested.

5.3  Applications 

The results show that single voter TMR designs do not necessarily improve design hardness when the TMR design is also irradiated. The TMRs placed after three functionally different combinational adders actually had worse performance than the behavioral adder that Xilinx defaults to. These results are mainly caused by the differences in device structure, which cause more space to be utilized and therefore more errors to be produced.
5.4 Future Work


Possible areas of future study include analysis of more advanced hardening design techniques such as those discussed in Chapters 2 and 3. These techniques for TMR include using word-wise TMR voting or developing a buffer-based TMR implementation. Alternatively, more robust modularly redundant designs could be proposed that would limit the effects of SEUs on the FPGAs.

Another  possible  area  of  study  is  to  compare  varying  module  sizes  between 
triplicated  TMRs.  This  could  be  used  to  optimize  the  placement  of  voting  logic  to 
maximize  error  protection  while  minimizing  additional  size  overhead  caused  by  the 
voting  logic. 

Additionally, further development of this testing methodology could be done to eliminate possible errors that could have occurred based on the stress placed on the clock signals. This would require a significant number of tests to evaluate the performance of the FPGA boards in the radiation environments.

Finally, these structures could be implemented as gate level devices across multiple slices in the FPGA. This would increase the FPGA utilization size and could allow analysis of error locations by analyzing the signals between the individual slices.